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ABSTRACT 

The computation of a structured canonical polyadic decompo¬ 
sition (CPD) is useful to address several important modeling 
problems in real-world applications. In this paper, we con¬ 
sider the identification of a nonlinear system by means of a 
Wiener-Hammerstein model, assuming a high-order Volterra 
kernel of that system has been previously estimated. Such 
a kernel, viewed as a tensor, admits a CPD with banded 
circulant factors which comprise the model parameters. To 
estimate them, we formulate specialized estimators based on 
recently proposed algorithms for the computation of struc¬ 
tured CPDs. Then, considering the presence of additive white 
Gaussian noise, we derive a closed-form expression for the 
Cramer-Rao bound (CRB) associated with this estimation 
problem. Pinally, we assess the statistical performance of 
the proposed estimators via Monte Carlo simulations, by 
comparing their mean-square error with the CRB. 

Index Terms — Tensor Decomposition, Structured CPD, 
Cramer-Rao bound, Wiener-Hammerstein model 

1. INTRODUCTION 

The canonical polyadic decomposition (CPD), which can be 
seen as one possible extension of the SVD to higher-order 
tensors IT], is by now a well-established mathematical tool 
utilized in many scientific disciplines E). As it requires only 
mild assumptions for being essentially unique, the CPD pro¬ 
vides means for blindly and jointly identifying the compo¬ 
nents of multilinear models, which arise in many real-world 
applications; see IBS] for some examples. 

In particular, the computation of CPDs having structured 
factors-such as Vandermonde, Toeplitz or Hankel matrices- 
has been shown useful in problems including channel esti¬ 
mation a, nonlinear system identification 0 and multidi¬ 
mensional harmonic retrieval 0. As a consequence, several 
special-purpose algorithms have been developed ||6H9l. 
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In practice, the data tensor to be decomposed is always 
corrupted by noise. Therefore, the assessment of the sta¬ 
tistical performance of CPD computation algorithms via 
comparison with the Cramer-Rao bound (CRB) moll is of 
practical interest, since it can guide the choice for an appro¬ 
priate algorithm in application domains. Eurthermore, it can 
provide valuable information for the study and development 
of such algorithms. Eor the unstructured CPD, ini has de¬ 
rived the associated CRB and presented an evaluation of the 
popular alternating least-squares (AES) algorithm for tensors 
of orders three and four. Regarding the structured case, the 
CRB for the estimation of a complex CPD with a particular 
Vandermonde factor has been derived in ifTTll . motivated by 
the problem of estimating the directions of arrival of multiple 
source signals. Also, ca has provided a closed-form expres¬ 
sion for the CRB associated with the estimation of a CPD 
having Hankel and/or Toeplitz factors. 

This paper addresses the statistical evaluation of algo¬ 
rithms specialized in computing a CPD having banded cir¬ 
culant factors when applied to estimate the parameters of a 
Wiener-Hammerstein (WH) model, which is a well-known 
block-oriented model used for representing nonlinear dynam¬ 
ical systems m. Because many systems of practical rele¬ 
vance can be (approximately) described by the WH model, 
the problem of identifying its parameters from a set of ex¬ 
perimental data (i.e., measured input and output samples) 
is well-studied; see, e.g., M and references therein. One 
possible approach, as described in 0, consists in estimating 
the WH model parameters by computing the structured CPD 
of a kernel of its equivalent Volterra model. Here, we derive 
a closed-form expression for the CRB associated with this 
estimation problem, assuming the availability of a previously 
identified Volterra kernel corrupted by white Gaussian noise. 
Then, we formulate specialized estimation algorithms based 
on the CP Toeplitz (CPTOEP) and circulant-constrained AES 
(CALS) methods proposed in Il8]|9l and evaluate their perfor¬ 
mance by comparing their mean-square error with the CRB 
through Monte Carlo simulations. 

Notation. Scalars are denoted by lowercase letters, e.g. di 



or Uij, vectors by lowercase boldface, e.g. 9 or a^, matrices 
by boldface capitals, e.g. B or and higher order arrays 
by calligraphic letters, e.g. X. We use the superscripts ^ for 
transposition, ^ for pseudo-inverse, Kl and 0 denote the Kro- 
necker and Khatri-Rao products, respectively, and 0 stands 
for the (tensor) outer product. The shorthand denotes 
aK...Kla, where a appears p times; a®^ and A®^’ are defined 
analogously. For our purposes, a tensor X of order P will be 
assimilated to its array of coordinates, which is indexed by P 
indices. Its entries will be denoted by 


u{n) 


W{z) 


9 {-) 


H(z) 


y{n) 


Fig. 1. Block-diagram of the Wiener-Hammerstein model. 


whose symmetric discrete-time Volterra kernels are (uniquely) 
given by ina 

L p 

k^P'>{mi,...,nip) = gp'^hiY^Wm^-i, (3) 

l=lo 9=1 


2. WIENER-HAMMERSTEIN MODEL 
IDENTIFICATION VIA CPD 


with M = Lyj + R— 1, Iq = max{0, mi —L^ -f 1,..., nip — 
Lw + 1} and L = min{i? — 1, mi,..., nip}. 


2 . 1 . Tensors and the CP decomposition 

The polyadic decomposition of a pth-order tensor is defined 

by R 

X = ^ a^^^ 0 a^^) 0 ... 0 a^p), (1) 

r—1 

where a^'^^ is the column of The minimal 

value of R such that X can be written as in ([Til is called the 
rank of X, in which case we refer to the above decomposition 
as the CPD of X. Another way of expressing ([T]i is 

X = a Xi A(i) X2-- - XpA(p), 

where J G ■■■ xrt ^ pth-order diagonal tensor such that 
P]r,...,r = land X g denotes the mode-g product (see, e.^.,||2l 
Sec.’ 2’.5]). 


2.3. CPD-based WH model identification 

We now describe the WH model identification approach pro¬ 
posed in El , which involves computing the CPD of a symmet¬ 
ric high-order Volterra kernel. We start by noting that, being a 
function of multiple discrete indices, any pth-order symmetric 
Volterra kernel of memory M can be uniquely identified 
with a pth-order symmetric tensor X S (jefiuedby 

= k^^\nii — 1,..., nip — 1). Owing to its convo- 
lutive form involving separable terms, the kernel in (O can be 
identified with the tensor 

R R 

X = gp'^hr-icfp = gp'^hr-i{Srw)®^ , (4) 

r—1 r—1 

where = [e,. ... with denoting the mth 

canonical basis vector of M.^, and w = [wq ... 

Expression (jH) is a symmetric CPD that can also be written as 


2.2. The WH model and its equivalent Volterra model 


The structure of a discrete-time WH model is as depicted in 
Fig.H] Basically, it consists of a cascade connection compris¬ 
ing a memoryless nonlinearity g{-) “sandwiched” by two lin¬ 
ear systems, W{z) and H{z). Because of its structured form 
constituted by fundamental blocks, the WH model is said to 
belong to the class of block-oriented models ifTTll . 

In this paper, we consider the time-invariant WH model 
constituted by a polynomial nonlinearity g{x) = J2p=i 
and by finite impulse response filters W{z) = ^ 

with Wo ^ 0, and H{z) = hrZ~^. Hence, the result¬ 

ing expression relating the input u{n) to the output y{n) is 


R-l 


y{n) = '^9p'^hr 




r=0 


'Lxy+r —1 

E 


Wm-rU{n — m) 


( 2 ) 


After some manipulation, this relation can be put in the equiv¬ 
alent Volterra model form 


P M-l M-1 p 

2/(’^)=E E ■■■ E 


p—1 mi—0 rrip—O 


9=1 


X = 3 X 1 C X 2 ■ • • Xp_i C Xp [gpC diag(h)], (5) 

where h = [/iq ■ • ■ and C = [ci ... c/j] G 

jsfQjg jjjg choice of which factor is postmultiplied 
by diag(h) is irrelevant, due to the scaling indeterminacy. We 
thus conclude that the WH model (|2]i has equivalent symmet¬ 
ric Volterra kernels whose CPD are constituted by circulant 
factors C and a factor of the form ppCdiag(h), which ab¬ 
sorbs the scaling coefficients gp and hr. 

As the factors in (|5]l contain the parameters of the linear 
blocks of the WH model (|2]l, the above observations suggest 
the following three-step procedure for its identification; (i) es¬ 
timate k^P'> from an available set of input/output samples, us¬ 
ing some Volterra kernel identification method (as, e.g., El), 
(ii) compute the structured CPD from the associated symmet¬ 
ric tensor X and (iii) estimate the coefficients gq, q ^ p, in 
the least-squares sense as explained in El- Note that this re¬ 
quires choosing some p > 3, since otherwise the model is 
not identifiable; for p = 1, it is a vector containing sums of 
products of coefficients gp, hr and wf, for p = 2, we have a 
bilinear decomposition, which is only unique under restrictive 
assumptions (such as orthogonality). Henceforth, we assume 
that (i) has been accomplished and focus on step (ii). 











3. ANALYTICAL CRB FOR CPD-BASED WH 
ESTIMATION ALGORITHMS 


3.1. Formulation of estimation problem 

Let us consider that a pth-order tensor has been constructed 
from a non-null estimated kernel k^P\ as described in the 
previous section. In practice, it is evident that such a ten¬ 
sor satisfies y = X + ]\f, where 3\f is an error tensor 
accounting for the inevitable uncertainties which arise in 
the data-driven kernel estimation procedure. Furthermore, 
since {nil, ■ ■ ■ ,rnp) is symmetric in mi,..., nip, in 
practice one estimates only the elements whose indices 
pertain to a suitable non-redundant domain such as D = 
{(mi,..., mp) : mi < ■ • • < m^}, determining the others 
by symmetry. Hence, y and AT are also pth-order sym¬ 
metric tensors, containing redundant elements. Introduc¬ 
ing the selection matrix IF G where I = \D\ = 

which contains as row^ every product of the form 
Kl ... Kl for (mi,..., mp) G D, we can write the 
(non-redundant) vectorized model 

y = ’4'vecCy) = x -I- n e 

where x = \Fvec(X) and n = lI'vec(Af) is a random vector. 
Now, from (IHi, we can deduce 


R 


R 


Vec(X) = Pp hr-l (SrW)^^ = 

-K.- 1 

gpY,hr-iSfP 

T* — 1 


R 




Pp'^hr-l^r 

r—1 


#(h)f(w), 


(6) 


where $(h) is given by the term between brackets, in which 
= S^P, and f(w) = w®p. 

Our problem can therefore be expressed as that of estimat¬ 
ing the parameters gp, w and h of the WH model from obser¬ 
vations which satisfy y = ’®'$(h)f(w) + n. We assume that 
the random vector n has zero-mean i.i.d. components drawn 
from the Gaussian distribution with variance a^. 


3.2. Identiliability 

Due to the inherent scaling indeterminacy of our model, its lo¬ 
cal identifiability is only guaranteed with further assumptions. 
To eliminate this indeterminacy, we assume wq = Qp = 1, 
which is sufficient due to the model structure. Note that this 
entails no loss of generality, as h and the other coefficients 
wi can be rescaled accordingly. Defining now w such that 
w = [1 w^]^, we can write the parameter vector of the WH 
model as ry = [w^ G R^. Global identifiability, on 

the other hand, is related to the uniqueness of the structured 
CPD. As the Ic-rank |l2][3l of C equals R, uniqueness follows 
from Kruskal’s condition ^ Sec. 3.2] if ||h||o = R (which 

*The ordering of the rows of ^ is of no consequence for our purposes. 


implies that the fc-rank of C diag(h) is R) and R > 2. If 
||h||o < R, then the fc-rank of C diag(h) equals zero; in this 
case, Kruskal’s condition is only met for P = 4 if i? > 3 and 
for P > 5 if i? > 2. 


3.3. Parameter estimation algorithms 

In this section, we briefly review two methods that can be 
used to estimate the parameters 77 of a model of the form (|4]i. 

3.3.1. Circulant-constrained ALS algorithm 

The first method consists of a specialization of the well- 
known alternating least squares (ALS) algorithm in which 
the factor matrices of the CPD are constrained as in (|5]l. In 
the case of a CPD involving only circulant factors, such strat¬ 
egy has already been followed in ||9l, leading to the CALS 
algorithm. Here, we adapt that algorithm for our purposes. 

Initially, we define E/ = [ej ... for 

/£{!,..., P„}, and E = [vec (El) ... vec (El,^)]. With 
these definitions, we have vec(C) = Ew. Next, we note that 
any flat matrix unfolding of y can then be written as 

Yft! Cdiag(h) (C®P-i)^= unwecR (Ew) diag(h) 


where the above approximation is due to the presence of 
noise and the operator unvec^; is defined such that, V a = 
... al)]^ with a^ G unvec 7 {(a) = [ai ... a/j]. 
Using the property vec(A diag(b)D) = (D^ © A)b, we 
have also vec(Y) « (C®^) h. Hence, given current esti¬ 
mates and h^, we can update them with the scheme 


(i) = ;^E^vec Y ( Wfc' 


diag (h'') 


-1 


(ii) = --- 

[vfc+i]/ ’ 

(iii) 0 C'^+^y vec(Y), 


where C'' = [Siw'' ... and = (C'=)®p-i. 

Note that, to derive (i), we have used E'l' = (1/P) E^. 

As stopping criteria, one can check whether the relative 

difference between two consecutive values of the reconstruc- 

2 


tion error Jy = 


y-jxiC^X2 


■■ XpC‘ 


falls below some fixed threshold e 
ber of iterations /Cmax is attained. 


• diag 


> 0 or a maximum num- 


3.3.2. CPTOEP algorithm 

Since the objective is multimodal, the main goal is to And a 
good approximation of the solution by using a low-complexity 
algorithm. In HI, non-iterative procedures have been pro¬ 
posed, which are able to compute the exact CPD when matrix 
factors are banded or structured. Consider a matrix unfold¬ 
ing ofy under the form: Y (C^^^ 0 C(^))A^, where the 














structure of A = © ... © is ignored, and where 

(jM are assumed Toeplitz circulant of same size M x R, 
that is, they can each be expressed in the orthonormal basis 
{Ei, 1 < i < Lu)} defined in Section [3.3.1l 

CW=^cf)E,, ne p}. 

e=i 

Let Y = USV^ denote the SVD of Y. Then there exists a 
matrix N such that UN = (C^^) © C^^)) and N-^SV^ = 
A^. Following the lines of HI, one can find matrix N and co¬ 
efficients Zij = by solving a linear system of M^R 

equations in -f — 1 unknowns. If there are more equa¬ 
tions than unknowns and if the system has full rank R, the 
solution (N, Z) is unique. First, coefficients and 
are obtained from the best rank-1 approximation of matrix 
Z, which eventually yields estimates and Next, 

we calculate C = -f C(^^)/2, and the estimate of h is 

obtained as in stage (iii) of the CALS algorithm. 

The algorithm described above is suboptimal for several 
reasons: (a) the model is noisy, (b) the p factor matrices are 
assumed to be independent, whereas they are not, and (c) the 
structure of A is ignored. Flence the solution obtained will 
be inaccurate, but can be easily refined by a quasi-Newton 
algorithm, as will be subsequently shown. 


leading thus to 

— = (F(w) Kl'5') [vec($i) ... vec ($/{)] . 

In order to identify the contribution of w and h in 
CRB(zI;fe) and CKB{hr), we propose to extend the results 
presented in ifTSll by using oblique projection. This is the 
purpose of the following proposition. We denote by Eab the 
oblique projection whose range is (A) and whose null space 
contains (B) (see ifTTl for details). 

Proposition 3.1 The closed-form expression for the CRB of 
Wk is given by: 

^2 

" tel|2-||EG,J(h)gfe|P-||Ej(h)G.gfe|P’ 

where gk is the kth column o/J(w) and Gk is the submatrix 
o/J(w) obtained by removing its kth column. Similarly, the 
closed-form expression for the CRB of hr is: 

^2 

" l|d.P-||EG^j(W)d,|P-||Ej(w)DAP’ 

where d,. is the rth column o/J(h) and is the submatrix 
of 3(h) obtained by removing its rth column. 

The proof is omitted due to the lack of space. 


3.4. Closed-form expression for the CRB 


4. SIMULATION RESULTS 


If we assume that r] contains deterministic parameters asso¬ 
ciated with a system of interest, we have that the (vectorized) 
measured kernel satisfies y ^ AA(x, cr^I/), where denotes 
the variance of the elements of n. Hence, the mean-square 
error (MSE) of any locally unbiased estimator f)(y) satisfies 

Lm-l R 

E{||r7-f)(y)f I > ^ CRB(wfe)+^CRB(/i^), 

k—1 r—1 

"-V-" 

trace(B(77)) 


where the CRB matrix B(rj) can be computed by applying 
the Slepian-Bangs formula, which yields O 

B(?7) = 0-2 (j(^)Tj(^)) 

where J(rj) G is the Jacobian matrix given by 


J(t 7 ) = [J(w) J(h)] 


dx dx 
dw dh 


From (|6l) and the definition of f, we have 
(9x 

— ='I>$(h)— =vp$(h)[zi(w) ... 


in which z;(w) = Y7q=i ^ ^ (with the 

convention = 1). To derive J(h), we first apply the 
property vec(ABD) = (D"^ Kl A)vec(B) to write 


X = vec(^$(h)f (w)) = (f^(w) lEI ’S') vec($(h)), 


To illustrate the utility of the derived CRB, we now present 
some Monte Carlo simulation results. Specifically, we evalu¬ 
ate several estimators when applied to identify a WH model 
with parameters = [1 0.538 1.834 -2.259 0.862]'^, 
h = [1.594 -6.538 -2.168]^ from estimates of the equivalent 
symmetric third-order kernel A, proceeding as follows. For 
each realization of the (symmetric) noise tensor N, we vary 
cr^ and then construct a data tensor ^ = X -f N for each cho¬ 
sen level of a^. Next, we compute estimates ri(y) G given 
by: (i) the family of estimators A-CALS, which consist in 
applying N times the algorithm of Section [3.3.1l with random 
initializations and keeping the best solution in terms of recon¬ 
struction error (w.r.t. V); (ii) the estimator CPTOEP, described 
in Section 13.3.21 (iii) the estimator CPTOEP-CALS, which 
corresponds to refining the CPTOEP estimate by applying 
the CALS algorithm; (iv) the estimator CPTOEP-BEGS, 
in which a similar refinement is obtained by minimizing a 
least-squares criterion (w.r.t. y) with the Broyden-Eletcher- 
Goldfarb-Shanno (BEGS) algorithirH ifTSl . The maximum 
number of iterations established for CALS and BEGS is 
Amax = 2000. We choose ey = 10“^° and set the tolerance 
of BEGS also as 10“^°. Eor each estimate f)(y), we compute 
~ 11'^ ~ '^(y)ll^- This procedure is repeated for 100 re¬ 
alizations of Af and then is averaged for each level of 
yielding a mean-square error estimate denoted by MSE^. 

^Specifically, we used the Fortran implementation whose Matlab interface 
is available at http: / /github. com/pcarbo/lbf gsb-mat lab. 














Table 1. Simulation results: estimated MSE,, values (in dB). 


1 /o-^ (dB) 


Estimator 

10 

20 

30 

40 

50 

60 

t-CALS 

19.22 

17.14 

18.37 

17.68 

18.53 

17.86 

5-CALS 

-15.04 

-25.05 

4.04 

4.05 

-55.07 

4.06 

tO-CALS 

-15.04 

-25.05 

-35.04 

-45.07 

-55.02 

-65.07 

CPTOEP 

-13.96 

-23.94 

-33.94 

-43.94 

-53.94 

-63.94 

CPTOEP-CALS 

-15.04 

-25.04 

-35.04 

-45.05 

-55.02 

-65.13 

CPTOEP-BEGS 

-20.04 

-30.03 

-40.03 

-50.02 

-60.01 

-69.62 

CRB 

-20.18 

-30.18 

-40.18 

-50.18 

-60.18 

-70.18 


The results are shown in Table [T] as well as the computed 
values of the CRB. One can see that 1-CALS has a very poor 
performance, due to its frequent premature termination or in¬ 
ability to converge. Although 5-CALS performs better, its re¬ 
sults are degraded for the same reasons. CPTOEP, in its turn, 
performs slightly worse than 10-CALS, but attains a similar 
level when refined by CALS. Yet, there remains a gap be¬ 
tween their MSE curves and that of the CRB. Indeed, only 
CPTOEP-BGES attains an MSE close to the CRB. Note that 
a similar gap has been reported by im for the AES algorithm. 
Along the lines of their discussion, we believe that, in the case 
of CALS, this gap is due to the convergence problems which 
are always observed in practice, at least for a few runs. As for 
CPTOEP, this seems to happen because the adapted procedure 
yields suboptimal estimates. 

Einally, we note that the above comparison is justified 
since, under the assumption of Gaussian additive noise, the 
least-squares criterion leads to the maximum likelihood (ML) 
estimator. In signal-in-noise problems, the ML estimator is 
often approximately unbiased even for a small sample size, 
provided that the SNR is sufficiently high M- 

5. CONCLUSION 

A closed-form expression of the CRB has been derived for 
the parameter estimates of a CPD having identical banded 
circulant factors, one of which is post-multiplied by a diag¬ 
onal scaling matrix. Then, two specialized algorithms have 
been proposed to compute a CPD with that structure. The 
first, named CALS, is an adaptation of the AES method taking 
the structural constraints into account, whereas the second is 
composed of two steps: (i) compute an approximate solution 
thanks to a non iterative algorithm (CPTOEP), and (ii) refine 
the solution via CALS or via a quasi-Newton descent (BEGS). 
The latter (CPTOEP-BEGS) reached the Cramer-Rao bound 
over a wide range of SNR values. The proposed algorithms 
have been applied to identify a WH model, and their statistical 
performance has been evaluated using the derived CRB. 

REFERENCES 

[1] P. Comon, “Tensors : A brief introduction,” IEEE Signal 
Process. Mag., vol. 31, no. 3, pp. 44-53, 2014. 


[2] T. Kolda and B. Bader, “Tensor decompositions and applica¬ 
tions,” SIAM Review, vol. 51, no. 3, pp. 455-500, 2009. 

[3] L.-H. Lim and P. Comon, “Blind multilinear identification,” 
IEEE Trans. Inf. Theory, vol. 60, no. 2, pp. 1260-1280, 2014. 

[4] C. E. R. Fernandes, G. Favier, and J. C. M. Mota, “Blind chan¬ 
nel identification algorithms based on the PARAFAC decom¬ 
position of cumulant tensors: the single and multiuser cases,” 
Signal Process., vol. 88, no. 6, pp. 1382-1401, 2008. 

[5] G. Favier and A. Y. Kibangou, “Tensor-based methods for 
system identification. Part 2: Three examples of tensor-based 
system identification methods,” Int. J. Sci. Tech. Automat. 
Control (IJ-STA), vol. 3, no. 1, pp. 870-889, 2009. 

[6] M. Sorensen and L. De Fathauwer, “Blind signal separation 
via tensor decomposition with Vandermonde factor: Canon¬ 
ical polyadic decomposition,” IEEE Trans. Signal Process., 
vol. 61, no. 22, pp. 5507-5519, 2013. 

[7] A. Y. Kibangou and G Favier, “Non-iterative solution for 
PARAFAC with a Toeplitz matrix factor,” in Proc. EUSIPCO, 
Glasgow, UK, 2009, pp. 691-695. 

[8] M. Sorensen and P. Comon, “Tensor decompositions with 
banded matrix factors,” Linear Alg. AppL, vol. 438, no. 2, pp. 
919-941, 2013. 

[9] J. H. de M. Goulart and G. Favier, “An algebraic solution 
for the Candecomp/PARAFAC decomposition with circulant 
factors,” SIAM J. Matrix Anal AppL, vol. 35, no. 4, pp. 1543- 
1562, 2014. 

[10] S. M. Kay, Fundamentals of Statistical Signal Processing: 
Estimation Theory, Fundamentals of Statistical Signal Pro¬ 
cessing. PTR Prentice-Hall, 1993. 

[11] X. Liu and N. D. Sidiropoulos, “Cramer-Rao lower bounds for 
low-rank decomposition of multidimensional arrays,” IEEE 
Trans. Signal Process., vol. 49, no. 9, pp. 2074-2086, 2001. 

[12] S. Sahnoun and P. Comon, “Tensor polyadic decomposition 
for antenna array processing,” in Proc. Int. Conf. Comput. 
Stat. (Compstat), Geneva, Switzerland, 2014, pp. 233-240. 

[13] M. Boizard, R. Boyer, G. Favier, and P. Comon, “Performance 
estimation for tensor CP decomposition with structured fac¬ 
tors,” in Proc. ICASSP, 2015. 

[14] R. Haber and L. Keviczky, Nonlinear system identification 
- Input-Output Modeling Approach, vol. 1 of Mathematical 
Modelling, Kluwer Academic Publishers, 1999. 

[15] A. Y. Kibangou and G. Favier, “Wiener-Hammerstein systems 
modeling using diagonal Volterra kernels coefficients,” IEEE 
Signal Process. Lett., vol. 13, no. 6, pp. 381-384, 2006. 

[16] A. Y. Kibangou and G. Favier, “Identification of fifth-order 
Volterra systems using i.i.d. inputs,” lET signal process., vol. 
4, no. 1, pp. 30-44, 2010. 

[17] R. T. Behrens and L. L. Scharf, “Signal processing applica¬ 
tions of oblique projection operators,” IEEE Trans. Signal 
Process., vol. 42, no. 6, pp. 1413-1424, 1994. 

[18] R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu, “A limited memory 
algorithm for bound constrained optimization,” SIAM J. Sci. 
Comput., vol. 16, no. 5, pp. 1190-1208, 1995. 













